70 research outputs found

    On the correspondence from Bayesian log-linear modelling to logistic regression modelling with g-priors

    Consider a set of categorical variables where at least one of them is binary. The log-linear model that describes the counts in the resulting contingency table implies a specific logistic regression model, with the binary variable as the outcome. Within the Bayesian framework, the g-prior and mixtures of g-priors are commonly assigned to the parameters of a generalized linear model. We prove that assigning a g-prior (or a mixture of g-priors) to the parameters of a certain log-linear model designates a g-prior (or a mixture of g-priors) on the parameters of the corresponding logistic regression. By deriving an asymptotic result, and with numerical illustrations, we demonstrate that when a g-prior is adopted, this correspondence extends to the posterior distribution of the model parameters. Thus, it is valid to translate inferences from fitting a log-linear model to inferences within the logistic regression framework, with regard to the presence of main effects and interaction terms. Comment: 27 pages
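The implied logistic regression mentioned in this abstract can be sketched for a small case. The derivation below is only an illustration under stated assumptions, not the paper's own notation: it assumes two further factors $X$ and $Z$, a binary outcome $Y \in \{0, 1\}$, a log-linear model with all two-way interactions, and corner-point constraints (every $\lambda$ term with an index at its reference level, including $y = 0$, is zero).

```latex
\begin{align*}
\log \mu_{xzy} &= \lambda + \lambda^{X}_{x} + \lambda^{Z}_{z} + \lambda^{Y}_{y}
                 + \lambda^{XZ}_{xz} + \lambda^{XY}_{xy} + \lambda^{ZY}_{zy}, \\
\operatorname{logit} P(Y = 1 \mid x, z)
               &= \log \mu_{xz1} - \log \mu_{xz0} \\
               &= \lambda^{Y}_{1} + \lambda^{XY}_{x1} + \lambda^{ZY}_{z1}.
\end{align*}
```

All terms that do not involve $Y$, including $\lambda^{XZ}_{xz}$, cancel in the difference, so the logistic regression has intercept $\lambda^{Y}_{1}$ and main effects for $X$ and $Z$ given by the corresponding interactions with $Y$.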

    Exploring dependence between categorical variables: benefits and limitations of using variable selection within Bayesian clustering in relation to log-linear modelling with interaction terms

    This manuscript is concerned with relating two approaches that can be used to explore complex dependence structures between categorical variables, namely Bayesian partitioning of the covariate space incorporating a variable selection procedure that highlights the covariates that drive the clustering, and log-linear modelling with interaction terms. We derive theoretical results on this relation and discuss whether they can be employed to assist log-linear model determination, demonstrating advantages and limitations with simulated and real data sets. The main advantage concerns sparse contingency tables. Inferences from clustering can potentially reduce the number of covariates considered and, subsequently, the number of competing log-linear models, making the exploration of the model space feasible. Variable selection within clustering can inform on marginal independence in general, thus allowing for a more efficient exploration of the log-linear model space. However, we show that the clustering structure is not informative on the existence of interactions in a consistent manner. This work is of interest to those who utilize log-linear models, as well as practitioners such as epidemiologists who use clustering models to reduce the dimensionality in the data and to reveal interesting patterns on how covariates combine. Comment: Preprint

    On the correspondence of deviances and maximum-likelihood and interval estimates from log-linear to logistic regression modelling

    Funding: The first author would like to acknowledge the support of the School of Mathematics and Statistics, as well as CREEM, at the University of St Andrews, and the University of St Andrews St Leonard’s 7th Century Scholarship. Consider a set of categorical variables P where at least one, denoted by Y, is binary. The log-linear model that describes the contingency table counts implies a logistic regression model, with outcome Y. Extending results from Christensen (1997, Log-linear models and logistic regression, 2nd edn. New York, NY, Springer), we prove that the maximum-likelihood estimates (MLEs) of the logistic regression parameters equal the MLEs of the corresponding log-linear model parameters, also considering the case where contingency table factors are not present in the corresponding logistic regression and some of the contingency table cells are collapsed together. We prove that, asymptotically, standard errors are also equal. These results demonstrate the extent to which inferences from the log-linear framework translate to inferences within the logistic regression framework, on the magnitude of main effects and interactions. Finally, we prove that the deviance of the log-linear model is equal to the deviance of the corresponding logistic regression, provided that no cell observations are collapsed together when one or more factors in P∖{Y} become obsolete. We illustrate the derived results with the analysis of a real dataset. Publisher PDF. Peer reviewed.
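The equality of maximum-likelihood estimates described in this abstract can be checked by hand in the simplest setting. The sketch below is a minimal illustration, not the paper's method: it assumes a 2×2 table of made-up counts, saturated models (so fitted values equal observed counts and the MLEs have closed forms), and corner-point constraints. The function names and the table are hypothetical.

```python
import math

# Hypothetical 2x2 contingency table, counts[x][y], invented for illustration.
counts = [[30, 10],
          [15, 45]]


def loglinear_saturated(counts):
    """MLEs of the saturated log-linear model under corner-point constraints:
    log mu_xy = lam + lam_x * x + lam_y * y + lam_xy * x * y."""
    (n00, n01), (n10, n11) = counts
    lam = math.log(n00)
    lam_x = math.log(n10) - math.log(n00)
    lam_y = math.log(n01) - math.log(n00)
    lam_xy = math.log(n11) - math.log(n10) - math.log(n01) + math.log(n00)
    return lam, lam_x, lam_y, lam_xy


def logistic_saturated(counts):
    """MLEs of logit P(Y=1 | X=x) = beta0 + beta1 * x, which for a saturated
    model reproduce the empirical logits in each row of the table."""
    beta0 = math.log(counts[0][1] / counts[0][0])
    beta1 = math.log(counts[1][1] / counts[1][0]) - beta0
    return beta0, beta1


_, _, lam_y, lam_xy = loglinear_saturated(counts)
beta0, beta1 = logistic_saturated(counts)

# The Y main effect matches the logistic intercept, and the XY interaction
# matches the logistic slope (the log odds ratio).
print(math.isclose(lam_y, beta0), math.isclose(lam_xy, beta1))
```

For non-saturated models the estimates would need an iterative fit, which this sketch deliberately avoids.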

    Parameter redundancy and the existence of the maximum likelihood estimates in log-linear models

    The work of the first author is supported by EPSRC PhD grants EP/J500549/1, EP/K503162/1 and EP/L505079/1. Log-linear models are typically fitted to contingency table data to describe and identify the relationship between different categorical variables. However, the data may include observed zero cell entries. The presence of zero cell entries can have an adverse effect on the estimability of parameters, due to parameter redundancy. We describe a general approach for determining whether a given log-linear model is parameter redundant for a pattern of observed zeros in the table, prior to fitting the model to the data. We derive the estimable parameters or functions of parameters and also explain how to reduce the unidentifiable model to an identifiable one. Parameter redundant models have a flat ridge in their likelihood function. We further explain when this ridge imposes some additional parameter constraints on the model, which can lead to obtaining unique maximum likelihood estimates for parameters that otherwise would not have been estimable. In contrast to other frameworks, the proposed novel approach informs on those constraints, elucidating the model that is actually being fitted. Postprint. Peer reviewed.

    Introducing a real-time interactive GUI tool for visualization of galaxy spectra

    To aid the understanding of the nonlinear relationship between galaxy properties and predicted spectral energy distributions (SED), we present a new interactive graphical user interface tool pipes_vis based on Bagpipes. It allows for real-time manipulation of a model galaxy's star formation history, dust and other relevant properties through sliders and text boxes, with each change's effect on the predicted SED reflected instantaneously. We hope the tool will assist in building intuition about what affects the SED of galaxies, potentially helping to speed up fitting stages such as prior construction, and aid in undergraduate and graduate teaching. pipes_vis is maintained and documented online at https://github.com/HinLeung622/pipes_vis; version 0.4.1 is archived in Zenodo (Leung 2021), and the tool is also available for installation through pip install pipes_vis. Postprint. Non peer reviewed.

    Common Methodological Challenges Encountered With Multiple Systems Estimation Studies

    Multiple systems estimation refers to a class of inference procedures that are commonly used to estimate the size of hidden populations based on administrative lists. In this paper we discuss some of the common challenges encountered in such studies. In particular, we summarize theoretical issues relating to the existence of maximum likelihood estimators, model identifiability, and parameter redundancy when there is sparse overlap among the lists. We also discuss techniques for matching records when there are no unique identifiers, exploiting covariate information to improve estimation, and addressing missing data. We offer suggestions for remedial actions when these challenges manifest. The corresponding R packages that can assist with the analysis of multiple systems estimation data sets are also discussed. Postprint. Peer reviewed.

    The 10-year follow-up of a community-based cohort of people with diabetes : the incidence of foot ulceration and death

    Funding: This work was funded as part of a wider project by the National Institute for Health Research (NIHR) Health Technology Assessment (HTA) Programme (HTA project: 15/171/01). Background: Identifying people with diabetes who are likely to experience a foot ulcer is an important part of preventative care. Many cohort studies report predictive models for foot ulceration in people with diabetes, but reports of long-term outcomes are scarce. Aim: We aimed to develop a predictive model for foot ulceration in diabetes using a range of potential risk factors with a follow-up of 10 years after recruitment. A new foot ulceration was the outcome of interest and death was the secondary outcome of interest. Design: A 10-year follow-up cohort study. Methods: 1193 people with a diagnosis of diabetes who took part in a study in 2006–2007 were invited to participate in a 10-year follow-up. We developed a prognostic model for incident foot ulceration using survival analysis with a Cox proportional hazards model. We also used Kaplan–Meier survival curves, and relevant tests, to assess the association between the predictor variables for foot ulceration and death. Results: At 10-year follow-up, 41% of the original study population had died and more than 18% had developed a foot ulcer. The predictive factors for foot ulceration were an inability to feel a 10 g monofilament or vibration from a tuning fork, previous foot ulceration and duration of diabetes. Conclusions: The prognostic model shows an increased risk of ulceration for those with a previous history of foot ulceration, insensitivity to a 10 g monofilament or a tuning fork, and duration of diabetes. The incidence of foot ulceration at 10-year follow-up was 18%; however, the risk of death for this community-based population was far greater than the risk of foot ulceration. Publisher PDF. Peer reviewed.
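The Kaplan–Meier curves mentioned in the Methods can be illustrated with a short sketch of the product-limit estimator. This is a minimal, assumption-laden example and not the study's analysis: the function name is hypothetical, the follow-up times and event indicators are invented toy values, and subjects censored at an event time are treated as still at risk at that time (the usual convention).

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Toy data, invented for illustration only (years of follow-up, event flag).
toy_times = [2, 3, 3, 5, 8, 10]
toy_events = [1, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t)={s:.3f}")
```

Comparing such curves between groups (for example, with and without a given predictor) would additionally need a test such as the log-rank test, which is outside this sketch.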

    Examining the Joint Effect of Multiple Risk Factors Using Exposure Risk Profiles: Lung Cancer in Nonsmokers

    Background: Profile regression is a Bayesian statistical approach designed for investigating the joint effect of multiple risk factors. It reduces dimensionality by using as its main unit of inference the exposure profiles of the subjects, that is, the sequence of covariate values that correspond to each subject. Objectives: We applied profile regression to a case–control study of lung cancer in nonsmokers, nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort, to estimate the combined effect of environmental carcinogens and to explore possible gene–environment interactions. Methods: We tailored and extended the profile regression approach to the analysis of case–control studies, allowing for the analysis of ordinal data and the computation of posterior odds ratios. We compared and contrasted our results with those obtained using standard logistic regression and classification tree methods, including multifactor dimensionality reduction. Results: Profile regression strengthened previous observations in other study populations on the role of air pollutants, particularly particulate matter ≤ 10 μm in aerodynamic diameter (PM10), in lung cancer for nonsmokers. Covariates including living on a main road, exposure to PM10 and nitrogen dioxide, and carrying out manual work characterized high-risk subject profiles. Such combinations of risk factors were consistent with a priori expectations. In contrast, other methods gave less interpretable results. Conclusions: We conclude that profile regression is a powerful tool for identifying risk profiles that express the joint effect of etiologically relevant variables in multifactorial diseases. Key words: air pollutants, Bayesian inference, clustering, combined effect, gene–environment interactions. Environ Health Perspect 119:84–91 (2011). doi:10.1289/ehp.1002118 [Onlin